297 research outputs found

    Reducing the Space Requirement of Suffix Trees

    Get PDF
    We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space-efficient representations. The most space-efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different types. This saves more than 8 bytes per input character compared with previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: data structures; suffix trees; implementation techniques; space reduction

    Comparative genomics of Arabidopsis and maize: prospects and limitations

    Get PDF
    The completed Arabidopsis genome seems to be of limited value as a model for maize genomics. In addition to the expansion of repetitive sequences in maize and the lack of genomic micro-colinearity, maize-specific or highly diverged proteins contribute to a predicted maize proteome of about 50,000 proteins, twice the size of the Arabidopsis proteome.

    Efficient implementation of lazy suffix trees

    Get PDF
    Giegerich R, Kurtz S, Stoye J. Efficient implementation of lazy suffix trees. SOFTWARE-PRACTICE & EXPERIENCE. 2003;33(11):1035-1049. We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different types. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods. Copyright © 2003 John Wiley & Sons, Ltd.
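    The lazy evaluation principle can be illustrated with a small Python sketch. It is a toy write-only top-down (wotd) construction that keeps suffix positions in ordinary Python objects, so it makes no attempt at the compact 12-bytes-per-character representation described above; the names `LazySuffixTree`, `contains` and `evaluated` are hypothetical. An inner node holds the list of suffix positions below it until a search first reaches it, and only then is it split into children by the next character.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    start: int  # incoming edge label is text[start:end]
    end: int
    # Remaining suffix positions below this node; held only until evaluated.
    pending: list[int] | None = None
    # None means "not evaluated yet"; leaves get an empty dict.
    children: dict[str, Node] | None = None


class LazySuffixTree:
    """Toy wotd suffix tree whose inner nodes are expanded on first traversal."""

    def __init__(self, text: str, sentinel: str = "$") -> None:
        if sentinel in text:
            raise ValueError("sentinel must not occur in text")
        self.text = text + sentinel
        self.root = Node(0, 0, pending=list(range(len(self.text))))
        self.evaluated = 0  # number of inner nodes expanded so far

    def _evaluate(self, node: Node) -> None:
        if node.children is not None:
            return  # already expanded, or a leaf
        t = self.text
        groups: dict[str, list[int]] = {}
        for p in node.pending:  # partition remaining suffixes by first character
            groups.setdefault(t[p], []).append(p)
        node.children = {}
        for c, group in groups.items():
            p0 = group[0]
            if len(group) == 1:  # a single suffix becomes a leaf
                node.children[c] = Node(p0, len(t), children={})
                continue
            # Longest common prefix of the group; the unique sentinel keeps it finite.
            lcp = 1
            while all(t[p + lcp] == t[p0 + lcp] for p in group):
                lcp += 1
            node.children[c] = Node(p0, p0 + lcp, pending=[p + lcp for p in group])
        node.pending = None  # the suffix list is no longer needed
        self.evaluated += 1

    def contains(self, pattern: str) -> bool:
        """Exact pattern search; only nodes on the search path are evaluated."""
        t, node, i = self.text, self.root, 0
        while i < len(pattern):
            self._evaluate(node)
            child = node.children.get(pattern[i])
            if child is None:
                return False
            for k in range(child.start, child.end):  # walk along the edge label
                if i == len(pattern):
                    return True
                if t[k] != pattern[i]:
                    return False
                i += 1
            node = child
        return True
```

    For example, on a fresh `LazySuffixTree("mississippi")` the query `contains("ssi")` expands only the root and one inner node, so `evaluated` is 2 afterwards. Nodes off the search path are never expanded, which is where the savings come from when only some parts of the tree are visited.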

    Efficient computation of absent words in genomic sequences

    Get PDF
    Herold J, Kurtz S, Giegerich R. Efficient computation of absent words in genomic sequences. BMC Bioinformatics. 2008;9(1): 167. Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies. Results: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10^9 down to 10^5 bp. Conclusion: The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.
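    To make the notion of unwords concrete, here is a naive Python sketch that finds the shortest words over {A, C, G, T} that do not occur in a sequence. It is not the algorithm from the paper: it builds a set of all k-mers for each k in turn, so its memory use grows with the sequence length, which is exactly the cost the authors' method avoids. The function name and the single-strand, uppercase ACGT assumptions are made only for this illustration.

```python
from itertools import product


def shortest_absent_words(seq: str, alphabet: str = "ACGT") -> list[str]:
    """Return all shortest words over `alphabet` that never occur in `seq`.

    Naive illustration: one strand only, one k-mer set per word length k.
    """
    seq = seq.upper()
    k = 1
    while True:
        # All length-k substrings present in the sequence (empty once k > len(seq)).
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        absent = [w for w in map("".join, product(alphabet, repeat=k)) if w not in present]
        if absent:
            return sorted(absent)
        k += 1
```

    For `"AAAA"` the result is `["C", "G", "T"]`; for `"ACGT"` every single letter occurs, so the sketch moves on to length 2 and returns the 13 dinucleotides that are missing.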

    LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

    Get PDF
    Background: Transposable elements are abundant in eukaryotic genomes, and they are believed to have a significant impact on the evolution of gene and chromosome structure. While several eukaryotic genome projects have been completed, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs). Results: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR transposon features such as length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and a specificity of 100% and 72%, respectively. This is comparable to or slightly better than annotations produced by previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software. Conclusion: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approximately 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, its structured design and implementation and its availability as open source provide an excellent base for incorporating novel concepts to further improve the prediction of LTR retrotransposons.
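    The idea of annotating candidates by known LTR features (length, distance, sequence motifs) can be sketched as a filter over candidate repeat pairs. This is a hypothetical Python illustration, not LTRharvest's code: it assumes the candidate pairs have already been found by some repeat finder, uses a simple ungapped identity, and applies a TG...CA boundary motif as an optional check. The threshold values and all names (`RepeatPair`, `filter_ltr_candidates`) are illustrative assumptions, not the tool's parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatPair:
    left: int    # 0-based start of the candidate 5' LTR
    right: int   # 0-based start of the candidate 3' LTR
    length: int  # length of each copy (simplification: both copies equal length)


def identity(a: str, b: str) -> float:
    """Percent identity of two strings compared position by position (no gaps)."""
    if not a and not b:
        return 100.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def filter_ltr_candidates(
    seq: str,
    pairs: list[RepeatPair],
    min_len: int = 100,
    max_len: int = 1000,
    min_dist: int = 1000,
    max_dist: int = 15000,
    min_identity: float = 85.0,
    motif: tuple[str, str] | None = ("TG", "CA"),
) -> list[RepeatPair]:
    """Keep repeat pairs whose length, spacing, similarity and motif fit an LTR model."""
    kept = []
    for p in pairs:
        distance = p.right - p.left  # distance between the two LTR start positions
        if not (min_len <= p.length <= max_len and min_dist <= distance <= max_dist):
            continue
        ltr5 = seq[p.left:p.left + p.length]
        ltr3 = seq[p.right:p.right + p.length]
        if identity(ltr5, ltr3) < min_identity:
            continue
        if motif is not None:
            head, tail = motif
            # Both LTR copies must start with `head` and end with `tail`.
            if not all(ltr.startswith(head) and ltr.endswith(tail) for ltr in (ltr5, ltr3)):
                continue
        kept.append(p)
    return kept
```

    Each constraint is a cheap check on a pair, so filtering stays linear in the number of candidates; the expensive part in practice is generating the candidate repeats, which this sketch takes as given.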

    Multitrophic interactions among Western Corn Rootworm, Glomus intraradices and microbial communities in the rhizosphere and endorhiza of maize

    Get PDF
    The complex interactions among the maize pest Western Corn Rootworm (WCR), Glomus intraradices (GI, recently renamed Rhizophagus intraradices) and the microbial communities in both the rhizosphere and endorhiza of maize have been investigated with a view to new pest control strategies. In a greenhouse experiment, four maize treatments were established: C (control plants), W (plants inoculated with WCR), G (plants inoculated with GI), and GW (plants inoculated with GI and WCR). After 20 days of WCR root feeding, larval fitness was measured. Dominant arbuscular mycorrhizal fungi (AMF) in soil and maize endorhiza were analyzed by cloning of 18S rRNA gene fragments of AMF, restriction fragment length polymorphism and sequencing. Bacterial and fungal communities in the rhizosphere and endorhiza were investigated by denaturing gradient gel electrophoresis of 16S rRNA gene and ITS fragments, respectively, PCR-amplified from total community DNA. GI significantly reduced WCR larval development and affected the naturally occurring endorhiza AMF and bacteria. WCR root feeding influenced the endorhiza bacteria as well. GI can be used in integrated pest management programs, rendering WCR larvae more susceptible to predation by natural enemies. The mechanisms behind the interaction between GI and WCR remain unknown. However, our data suggest that GI might act indirectly via plant-mediated mechanisms influencing the endorhiza microbial communities.

    Significant speedup of database searches with HMMs by search space reduction with PSSM family models

    Get PDF
    Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive
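    The abstract breaks off before describing the method, so the following is only a generic illustration of the idea named in the title: a cheap position-specific scoring matrix (PSSM) scan discards most database sequences, and only the survivors would be handed to the expensive pHMM search. The PSSM construction (ungapped alignment block, uniform background, pseudocounts), the penalty for unknown residues, the threshold, and all function names are assumptions for this sketch, not details taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from an ungapped alignment block, uniform background."""
    background = 1.0 / len(AMINO)
    total = len(block) + pseudocount * len(AMINO)
    pssm = []
    for col in range(len(block[0])):
        counts = Counter(seq[col] for seq in block)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background) for a in AMINO})
    return pssm


def best_window_score(pssm: list[dict[str, float]], seq: str) -> float:
    """Maximum PSSM score over all ungapped windows of the sequence."""
    width = len(pssm)
    best = -math.inf
    for i in range(len(seq) - width + 1):
        # Residues outside AMINO get a fixed penalty (an assumption of this sketch).
        score = sum(pssm[j].get(seq[i + j], -10.0) for j in range(width))
        best = max(best, score)
    return best


def prefilter(pssm: list[dict[str, float]], database: dict[str, str], threshold: float) -> list[str]:
    """Names of sequences that pass the cheap filter and deserve a full pHMM search."""
    return [name for name, seq in database.items() if best_window_score(pssm, seq) >= threshold]
```

    The design choice is a speed/sensitivity trade-off: the PSSM scan is a single pass per sequence, and the threshold controls how many true family members might be discarded before the pHMM ever sees them.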

    Structator: fast index-based search for RNA sequence-structure patterns

    Get PDF
    Background: The secondary structure of RNA molecules is intimately related to their function and is often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results: We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular those of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. 
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns, and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, with Structator we provide a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. Funded by the Deutsche Forschungsgemeinschaft (grant WI 3628/1-1).
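    The inside-out matching of a stem-loop can be sketched in a few lines of Python. In this simplified, hypothetical version a plain `str.find` scan locates an exact loop sequence, standing in for the affix-array lookup, so it runs in linear rather than sublinear time. The stem is then extended outward one base pair per step, with Watson-Crick and G-U wobble pairs assumed to be allowed. The function name and its parameters are illustrative only.

```python
# Allowed base pairs (assumption: Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def match_stem_loops(text: str, loop: str, stem_len: int) -> list[tuple[int, int]]:
    """Find hairpins with an exact `loop` closed by `stem_len` base pairs.

    Returns half-open (start, end) intervals covering stem plus loop.
    """
    hits = []
    start = text.find(loop)  # step 1: match the loop region first
    while start != -1:
        left, right = start - 1, start + len(loop)
        paired = 0
        # Step 2: extend outward, checking one pair of boundary bases at a time.
        while (paired < stem_len and left >= 0 and right < len(text)
               and (text[left], text[right]) in PAIRS):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        start = text.find(loop, start + 1)  # next (possibly overlapping) loop occurrence
    return hits
```

    With `text = "UUGCGGAAACGCUU"`, loop `"GAAA"` and a stem of 3, the sketch reports `(2, 12)`, i.e. `GCG` + `GAAA` + `CGC`. A mismatching pair stops the extension immediately, which is the search-space reduction the abstract describes; an index supporting bidirectional extension would apply the same early cut-off to all loop occurrences at once.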

    Ergonomic design of user guides in multimedia environments with remote controls and onscreen displays

    Get PDF
    Over a three-year project, a new type of remote control and onscreen display was developed after comparing and analyzing existing remote controls and multimedia devices. The project was initiated by a German manufacturer of consumer electronics. Usability and user acceptance were tested and supplemented by questionnaires. The system consists of a remote control with a single control element and a correspondingly co-developed onscreen display. A distinguishing feature of this onscreen display is that moving the thumb across the surface of the sensor pad produces a corresponding motion on the display. All operating functions are grouped into four menus at both sides and at the top and bottom of the screen. User testing showed that haptic elements are well suited to supporting users through imprinted routines and to avoiding the need for visual control during use.
    corecore